logo

Introduction

This report collects freely available crime data for the Liverpool Local Authority District (LAD) in order to produce web maps. The geography of crime will be explored in addition to its temporal dynamics and whether the pattern of crime varies across months in 2018. An explanation for the observed patterns will refer to the findings of previous literature.

Code to produce figures is included inline and minimised (click the ‘Code’ buttons), data preprocessing is shown as an appendix. For full reproducibility the .Rmd and raw data are included.

Data Description

This report analyses crime data extracted from the Open Police Data API (Police UK 2019) for the 12 months of 2018. Data was provided with WGS 84 (GPS) coordinates and filtered to only include points with fall within the Liverpool Local Authority District (LAD). The data itself is obfuscated slightly with a spatial jitter, where actual crime locations are transferred to snap points to prevent a specific crime location from being identifiable (see Bridwell (2007); Figure 1). This process is outlined by Police UK (2019):

  • The location of each crime is compared against a master list of “snap points” to find the nearest.

  • The coordinates of the crime are then replaced with the coordinates of the snap point.

  • If the nearest snap point is over 20 km away, coordinates of zero are assigned so that the crime is not shown on the subsequent map.

# find duplicate points
crimePoints$duplicate <- duplicated(crimePoints$geometry)

# plot duplicate points
ggplot() +
  geom_sf(data = livSHP, alpha = 0.8, size = 0) +
  geom_sf(data = crimePoints[1:1000,], # subset for readability
          aes(color = duplicate, fill = duplicate))  +
  scale_color_manual(values=c("#999999", "#E69F00"), guide = F) +
  scale_fill_manual(values=c("#999999", "#E69F00"),
                    name = "",
                    guide = guide_legend(
                      direction = "horizontal",
                      keyheight = unit(5, units = "mm"),
                      keywidth = unit(70 / length(labels),
                                      units = "mm"),
            title.hjust = 0.5,
            label.hjust = 1,
            nrow = 1,
            byrow = T,
            label.position = "bottom")) +
  theme_map() +
  theme(legend.position = "bottom")
Crime reports sharing the exact coordinates (first 1000 points)

Figure 1: Crime reports sharing the exact coordinates (first 1000 points)

Shapefiles containing census geography were downloaded from the Consumer Data Research Centre (CDRC 2019), in this assessment primarily the largest geography has been utilised; ‘Middle Super Output Areas’ (MSOA), in an effort to reduce the inaccuracy introduced by snap points.

Figure 1 highlights the extent at which points are obfuscated, showing points that share exact coordinates (snap points), in total 92.5% of the 73318 total crime points used in this report share the same coordinates as one or more other point. For this reason it was considered more reliable to choose aggregation that is less likely to result in point inaccuracy through aggregation into an incorrect polygon, as suggested in Tompson et al. (2015).

crimePointsSP <- crimePoints %>% # change to use sp
  as_Spatial()
crimePointsSP <- crimePointsSP[1:2000,] # more managable

# create specific color scheme for each crime category
color_scheme <- viridis::cividis(n_distinct(crimePointsSP@data$category))
pal = colorFactor(color_scheme, crimePointsSP@data$category)

# leaflet heat map
crimePointsSP %>% 
  leaflet(width="100%") %>% 
  # add basemap 
  addProviderTiles(providers$CartoDB.Positron) %>%
  # leaflet circles
  addCircleMarkers(fillColor = ~pal(category), # use color scheme
                   stroke = FALSE,
                   fillOpacity = 0.8,
                   clusterOptions = markerClusterOptions(),
                   popup = ~as.character(category),
                   group = "Markers"
                   ) %>% 
  # heatmap from leaflet.extra
  addHeatmap(radius = 8,
             group = "Heatmap"
             ) %>%
  # corner minimap with toggle
  addMiniMap(tiles = providers$CartoDB.Positron,
             toggleDisplay = TRUE
             ) %>%
  # button to return to default zoom
  addEasyButton(easyButton(
    icon="fa-crosshairs", title="Reset Zoom",
    onClick=JS("function(btn, map){map.setZoom(11);}"))) %>%
  # layer control
  addLayersControl(
    overlayGroups = c("Heatmap", "Markers"),
    options = layersControlOptions(collapsed = FALSE)
  )

Figure 2: Interactive Heatmap showing a sample of crime from January 2018

Analysis

Heatmap

Figure 2 gives an overview of the typical distribution of crime within Liverpool, taking a small random sample of total crimes reported from 2018 (total crime becomes too hard to interpret). Both the heatmap and markers may be toggled off to allow for individual interpretation and the map in the bottom right gives context within the UK but may be minimised.

The basic trend appears to show that there is increased crime towards the city centre, this is typical of many cities, due to the higher concentration of people. In particular, Liverpool is widely considered a popular tourist destination (Shukla, Brown, and Harper 2006), with popular night-life, so it is to be expected that a disproportionate level of crime is focused towards these areas, often fuelled by alcohol consumption (Calafat et al. 2011). In addition there appears to be a divide between the southern end of the Liverpool LAD and the northern end, as crime reports are more sparse towards the south.

Choropleth Maps

Click the buttons to switch between figures.

Interactive Choropleth

# turn sf to sp
livSP <- crimePoly %>%
  as_Spatial()
# simplify geometry to reduce memory usage
livSimSP <- livSP %>% 
  gSimplify(tol=0.0005, topologyPreserve=F)

# re add @data
livSimSP <- SpatialPolygonsDataFrame(livSimSP, data=livSP@data)

# create quantile scale
qpal <- colorQuantile("magma", livSimSP$crime, n = 6, reverse = T)

# init map + prevent scrolling/dragging
map <- leaflet(width = "100%", livSimSP, 
               options = leafletOptions(preferCanvas = T,
                                        zoomControl = F,
                                        minZoom = 11,
                                        maxZoom = 11,
                                        dragging = F)) %>%
  addProviderTiles(providers$CartoDB.Positron) # basemap

# loop to create 12 groups toggleable
for (i in 1:length(months)) {
  
  month <- months[i]
  # find values for current month
  crimeMonth <- livSimSP[livSimSP$month == month,]
  
  labels <- sprintf(
  "<strong>%s</strong><br/>%g crimes",
  crimeMonth$MSOA11CD, crimeMonth$crime
  ) %>% lapply(htmltools::HTML)
  
  map <- map %>%
        addPolygons(
          fillColor = ~qpal(crimeMonth$crime),
          data = livSimSP,
          weight = 2,
          opacity = .5,
          color = "grey",
          fillOpacity = 0.7,
          label = labels,
          labelOptions = labelOptions( # hover labels
            style = list("font-weight" = "normal",
                         padding = "3px 8px"),
            textsize = "10px",
            direction = "auto"),
          highlight = highlightOptions( # highlight in white
            opacity = 1,
            weight = 5,
            color = "white",
            bringToFront = T
          ),
          group = month, # current month = group name
          )
}

map <- map %>%
  addLayersControl( # layers are toggleable
    baseGroups = months,
    options = layersControlOptions(collapsed = T)) %>%
  addLegend(pal = qpal,
            values = ~crimeMonth$crime,
            title = "Number of Crimes",
            position = "bottomright",
            # function gives values rather than percentage
            labFormat = function(type, cuts, p) {
    n = length(cuts)
    paste0(cuts[-n], " &ndash; ", cuts[-1])
    })

Figure 3: Interactive Choropleth with Month Selection

Video Choropleth

a <- ggplot() +
    geom_sf(data = crimePoly, aes(fill = crime_quantiles, group = month)) +
    # animate through months
    transition_states(month, transition_length = 1,
                      state_length = 1) +
    theme_map() + # custom function
    ## aesthetics only below
    theme(legend.position = "bottom") +
    labs(x = NULL, 
         y = NULL, 
         title = "Month: {closest_state}",
         subtitle = "MSOA Level") +
    scale_fill_viridis(
                       option = "magma",
                       name = "Number of Crimes",
                       discrete = T,
                       direction = -1,
                       guide = guide_legend(
            direction = "horizontal",
            keyheight = unit(2, units = "mm"),
            keywidth = unit(8, units = "mm"),
            title.position = 'top',
            title.hjust = 0.5,
            label.hjust = 1,
            nrow = 1,
            byrow = T,
            reverse = F,
            label.position = "bottom")
            )

# set animation options
animate(a, fps = 30,
        duration = 10,
        renderer = ffmpeg_renderer(
          options = list(pix_fmt = "yuv420p", loop = 0)))

anim_save("./figs/gganim1.mp4", animation = last_animation())
Choropleth showing variation in crime through months

Figure 4: Choropleth showing variation in crime through months

Figure 3 is another interactive map built with leaflet for R that aggregates the number of crimes into MSOAs and allows toggling between months (top right). Highlighting over an MSOA will display information regarding the number of crimes, and the name of that particular MSOA. This choropleth utilises MSOA scale aggregation of the crime points, with quantile bins which allows for an interpretable colour scale. Again, for each month, there is a concentration of very high crime focused towards the city centre, with a band of low crime towards the southern end of the LAD.

It is also possible to explore the variation in crime between months with the use of the new package gganimate (Pedersen 2019). Figure 4 is an mp4 video that shows the variation in crime between each month in 2018, sharing a scale also allows for direct comparison between months. The main benefit for the use of Figure 4 over Figure 3 is that the mp4 is much more lightweight, and much simpler to produce with fewer lines of code. I also find it to be more visually appealing at the cost of less interactivity.

Choropleth Analysis

Both figures show that interestingly, the most areas of very high crime (highest bin) appear in October/November but with relatively few in December (see Table 1). It should be noted that this does not imply higher levels of crime during these months, just that there are more MSOAs that have high levels of crime, i.e. crime is more spread out in the months running up to December. In these months, as many of these high crime MSOAs lie outside the city centre, it is likely that they do not represent necessarily the types of crime associated with the night-life culture.

# find num highest bin per month
highCrimeM <- crimePoly %>%
  subset(crime_quantiles == "722") %>%
  group_by(month) %>% 
  count(month)

# remove geometry
highCrimeM$geometry <- NULL 

# plot table
kable(highCrimeM,
      caption = "Total Number of MSOA with highest Crime bin by Month.",
      col.names = c('Month', 'Number of High Crime MSOA')) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = F, position = 'float_right')
Table 1: Total Number of MSOA with highest Crime bin by Month.
Month Number of High Crime MSOA
2018-01 10
2018-02 4
2018-03 8
2018-04 9
2018-05 14
2018-06 9
2018-07 13
2018-08 10
2018-09 11
2018-10 15
2018-11 13
2018-12 6

One issue with this aggregation is that the final bin using a quantile method gives a very large range of 124 - 720 crimes. This results due to two MSOA in particular, E02006932 and E02006932 (see Figure 3), both of which consistently have very high incidences of reported crime in relation to all other MSOAs. It has been suggested that areas which have displayed what are believed to be erroneous levels of crime can be attributed to reported crimes for which their location was unknown, leading to the location of the crime being reported as happening at the local police station (Tompson et al. 2015). However, choosing quantile binning, given the size of the final bin, the likely erroneous MSOAs do not significantly impact the visualisation, still allowing for interpretation.

Variation in Specific Crime

Figure 5 makes it clear that by far the two most dominant types of crime are ‘violent crime’, and ‘anti-social-behaviour’. Typically these types of crime come with the night-life and drinking culture that is associated with many city centres (Calafat et al. 2011). However, it could be said that these categories aggregate a range of crimes moreso than other categories that describe more specific crimes (Tompson et al. 2015). The general trend appears to show that both these types of crime are lower during the winter months, with higher reported crime during the summer. Again, this is likely attributed mostly due to the drinking culture typical of British cities, of which these crimes are typical. There is little recent literature that appears to confirm a seasonable variation in crime rates, however Lewis and Alford (1975) found lower crime rates do correlate with winter months, although do not attribute this to the physical attributes of the seasons. Instead it seems likely that tourism population, and a general populations willingness to participate in the nightlife culture may result in this observed variation between winter and summer months. Li et al. (2013) analysed venue popularity with Foursquare data and found typically venues are more popular in summer months.

There appears to be three defined groupings on Figure 5, the fewest crimes being theft and robbery. A medium grouping of drugs offences, shoplifting, burglary, criminal damage and vehicle crime, and the much higher grouping of violent crime and anti social behaviour. These groupings I believe largely reflect the level of aggregation in definitions of the type of crime above anything else, as suggested in Tompson et al. (2015), specifically they note that violent crime includes all assaults, and can include crimes ranging from murder to assault with no injury.

# find count of each category for each MSOA per month
crimeCount <- crime
  
crimeCount$geometry <- NULL # remove geom

##need to convert to a date for animation to work
crimeCount$month <- as.Date(paste0(crimeCount$month, "-01"))
# aggregate counts per category per month
crimeCount <- aggregate(cbind(crimeCount[0],count=1), crimeCount, length)
# shorten name lengths
crimeCount$category <- substr(crimeCount$category,1, 16)

# plot types of crime over months
a <- ggplot(crimeCount, aes(month, count, group = category)) + 
  geom_line() + 
  geom_segment(aes(xend = as.Date("2018-12-01"), yend = count), # dotted lines
               linetype = 2, colour = 'grey') + 
  geom_point(size = 2) + 
  geom_text(aes(x = as.Date("2018-12-10"), label = category),
            hjust = 0, size = 3, check_overlap = T) + # exlude when overlap
  # allow text to show after graph and set x labs
  scale_x_date(date_breaks = "2 month", date_labels = "%b",
             limits = as.Date(c('2018-01-01','2019-01-30'))) +
  # animate by month
  transition_reveal(month) +
## aesthetics
  theme_minimal() +
  theme(
    text = element_text(family = "Ubuntu Regular", color = "#22211d"),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_line(color = "#ebebe5", size = 0.2),
    panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
    plot.background = element_rect(fill = "#f5f5f2", color = NA), 
    panel.background = element_rect(fill = "#f5f5f2", color = NA), 
    legend.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.border = element_blank()
       )

# animation settings
animate(a, fps = 30,
        duration = 10,
        renderer = ffmpeg_renderer(
          options = list(pix_fmt = "yuv420p", loop = 0)))

anim_save("./figs/gganim2.mp4", animation = last_animation())
Variation in Types of Crime Across 2018

Figure 5: Variation in Types of Crime Across 2018

Conclusion

New open crime data can be used to give insight into spatial and temporal trends, and categorised crime allows for an in depth discussion of the culture that influences patterns in crime, particularly regarding the night-life and drinking culture in many UK cities.

However, care should be taken when interpreting the aggregation of certain crimes, and the wide variation in reported crimes for each category suggests there is likely some level of ambiguity and potential for error in some instances, e.g. is ‘theft from the person’ sometimes reported as ‘violent crime’? Some are unambiguous, ‘burglary’ and ‘shoplifting’ for example, and likely provide a more accurate representation of the true levels of crime. The ‘snap points’ employed to obfuscate exact crime locations are likely to cause inaccuracy when looking at fine level aggregation, although, given there appears to be a reasonably large number of snap points (Figure 1), this is likely minimised.

References

Bridwell, Scott A. 2007. “The dimensions of locational privacy.” In, 88:209–25. Dordrecht: Springer Netherlands. https://doi.org/10.1007/1-4020-5427-0_14.

Calafat, Amador, Nicole Blay, Mark Bellis, Karen Hughes, Anna Kokkevi, Fernando Mendes, Barbara Cibin, et al. 2011. Tourism, nightlife and violence: a cross cultural analysis and prevention recommendations. Edited by IREFREA. September 2014. IREFREA. http://www.irefrea.eu/uploads/PDF/Calafatetal{\_}2010.pdf.

CDRC. 2019. “Consumer Data Research Centre.” https://www.cdrc.ac.uk/.

Lewis, L T, and J J Alford. 1975. “The Influence of Season on Assault.” The Professional Geographer 27 (2): 214–17. https://doi.org/10.1111/j.0033-0124.1975.00214.x.

Li, Yanhua, Moritz Steiner, Limin Wang, Zhi Li Zhang, and Jie Bao. 2013. “Exploring venue popularity in Foursquare.” Proceedings - IEEE INFOCOM, 3357–62. https://doi.org/10.1109/INFCOM.2013.6567164.

Pedersen, Thomas. 2019. gganimate: A Grammar of Animated Graphics. http://github.com/thomasp85/gganimate.

Police UK. 2019. “Data.police.uk.” https://data.police.uk/.

Shukla, Paurav, Janice Brown, and Donna Harper. 2006. “Image association and European capital of culture: Empirical insights through the case study of Liverpool.” Tourism Review 61 (4): 6–12. https://doi.org/10.1108/eb058481.

Tompson, Lisa, Shane Johnson, Matthew Ashby, Chloe Perkins, and Phillip Edwards. 2015. “UK open source crime data: accuracy and possibilities for research.” Cartography and Geographic Information Science 42 (2): 97–111. https://doi.org/10.1080/15230406.2014.972456.

Appendix I: Preprocessing

Default Chunk Options

# set default chunk options
knitr::opts_chunk$set(echo=F, message = F, cache = F) # careful with cache

Required Packages

library(tidyverse) # data wrangling
library(jsonlite) # convert json to dataframe
library(sf) # new spatial tools
library(rgdal) # additional spatial tools
library(tmap) # map visualisations
library(leaflet) # interactive maps
library(leaflet.extras) # extra leaflet functions 
library(gganimate) # animated ggplots
library(viridis) # colour schemes
library(kableExtra) # table formatting
library(rgeos) # spatial tools

# wd to source location
setwd("/home/cjber/Dropbox/uni/ENVS456/a1/")

Sequence of Months

# create list of 12 months in 2018
months <- seq(as.Date("2018-01-01"),
              as.Date("2018-12-01"),
              by="months") %>% 
  substr(1,7) # remove date

Read in MSOAs

# sf read in liv MSOA shapefile as mercator
livSHP <- st_read("./shapefiles/Liverpool_msoa11.shp",
                  quiet=T) %>% 
  st_transform(4326)

Find Coordinates for Liverpool Crime

# create vector of liv bbox
livRegion <- st_bbox(livSHP) %>% 
  as.vector()

index <- c(2,1,4,3) # specify new col order (lat lon)
livRegion <- livRegion[order(index)] # reorder the columns

index <- c(1,4,3,2) # swap lat and lon order (four coords of square box)
livRegion <- c(livRegion, livRegion[order(index)])

# combine each lat and lon with comma sep
livRegion[1] <- paste(livRegion[1:2], collapse=",")
livRegion[2] <- paste(livRegion[3:4], collapse=",")
livRegion[3] <- paste(livRegion[5:6], collapse=",")
livRegion[4] <- paste(livRegion[7:8], collapse=",")


index <- c(4,2,3,1) # order for square coords matters (clockwise)
livRegion <- livRegion[order(index)] # reorder the columns


# remove last 5 items and collapse all with : seperator
livRegion <- paste(livRegion[1:4], collapse=":") %>% 
  toString()

For Loop: Police API

# base police API url
url <- c("https://data.police.uk/api/crimes-street/all-crime")

# create empty data.frame
df <- data.frame()

# loop over each month in list into api url and combine to data.frame
for(i in 1:length(months))
{
    month <- months[i] # iterate through months
    results <- fromJSON(paste0(url,"?poly=",livRegion,"&date=",month),
                        flatten=T) # combine url
    df <- rbind(df, results) # append all to df
}

# retain useful cols
df <- df[,c("category", "month", "location.latitude",
            "location.longitude")]

# write to csv to avoid running loop again
write.csv(df, file = "./data/crime.csv")

Read in Crime Data

# read saved csv
crime <- read.csv("./data/crime.csv")
# retain useful cols
crime <- crime[,c("category", "month", "location.latitude", "location.longitude")]

Spatial Join Crime to MSOAs

crime$crime <- 1 # crime count for aggregating

# convert to spatial object
crime <- st_as_sf(crime, coords = c("location.longitude",
                                    "location.latitude"),
                  crs = "+init=epsg:4326")
  
crimePoints <- crime %>% 
  st_join(livSHP, left = F) %>% # spatial join with livSHP
  na.omit() # remove na produced by join (points outside lsoas)

Create MSOA Polygons with Crime

# spatial join crime/shp keep polys, sum crime
crimePoly <- livSHP %>% 
  st_join(crime, left = F) %>% 
  group_by(month, MSOA11CD) %>% 
  summarise(crime = sum(crime))

# save data as this is expensive
save(crimePoly, file = "./shapefiles/crimePoly.RData")

Create Quantiles for Crime Polygons

# load in crimePoly
load(file = "./shapefiles/crimePoly.RData")

no_classes <- 6 # specify 6 bins
labels <- c() # empty vector

# find quantiles for each msoa
quantiles <- quantile(crimePoly$crime, 
                      probs = seq(0, 1, length.out = no_classes + 1))

# define custom labels, remove decimal etc
labels <- c()
for(idx in 1:length(quantiles)){
  labels <- c(labels, paste0(round(quantiles[idx + 1], 0)))
}

# remove final label
labels <- labels[1:length(labels)-1]

# new variable in dataset with quantiles
crimePoly$crime_quantiles <- cut(crimePoly$crime, 
                                     breaks = quantiles, 
                                     labels = labels, 
                                     include.lowest = T)

Default Map ggplot2 Theme Function

## function with my preferred ggplot2 theme settings (for maps)
theme_map <- function(...) {
  theme_minimal() +
  theme(
    text = element_text(family = "Ubuntu Regular", color = "#22211d"),
    axis.line = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_line(color = "#ebebe5", size = 0.2),
    panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
#    panel.grid.minor = element_blank(),
    plot.background = element_rect(fill = "#f5f5f2", color = NA), 
    panel.background = element_rect(fill = "#f5f5f2", color = NA), 
    legend.background = element_rect(fill = "#f5f5f2", color = NA),
    panel.border = element_blank(),
    ...
  )
}

Appendix II: Template

This template was largely produced by myself but takes aspects of the epuRate R Markdown template by Holtzy. Fonts are inspired by the Typora theme Ursine.

Thanks mostly to Yihui for his contributions to R Markdown, making it what it is today.